Using Unsupervised Feature-Based Speaker Adaptation for Improved Transcription of Spoken Archives
Abstract
This paper deals with unsupervised feature-based speaker adaptation techniques. The goal is to design an optimal adaptation approach for improving the recognition accuracy of a large-vocabulary continuous speech recognition (LVCSR) system developed for automatic transcription of large archives of spoken Czech (e.g., the archive of parliamentary speeches, historical archives of Czech broadcast stations, etc.). For this purpose, several modifications of the vocal tract length normalization (VTLN) and constrained maximum likelihood linear regression (CMLLR) techniques were investigated and combined. The study focuses on applying the adaptation methods both in the recognition process and in building a normalized acoustic model within the speaker adaptive training scheme. The methods were evaluated experimentally on a large and varied data set (93k words in total). The resulting two-step adaptation scheme yields a significant WER reduction from 17.8 % to 14.8 %.
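The abstract gives no implementation details, so the following is only a minimal illustrative sketch of what feature-based adaptation generally looks like, not the authors' method. It assumes a common piecewise-linear form of VTLN frequency warping and the standard affine form of a CMLLR feature transform (x' = Ax + b). The function names, the warp factor `alpha`, the 8 kHz upper frequency, and the 0.85 cutoff ratio are all hypothetical choices made for illustration. The last helper simply works out the relative improvement implied by the reported WER figures.

```python
def vtln_warp(freq, alpha, f_max=8000.0, cut_ratio=0.85):
    """Piecewise-linear VTLN warp (assumed form, not taken from the paper).

    Frequencies below a cutoff are scaled by alpha; above it, a second
    linear segment is used so that f_max still maps to f_max.
    """
    f_cut = cut_ratio * f_max / max(alpha, 1.0)
    if freq <= f_cut:
        return alpha * freq
    # Linear segment joining (f_cut, alpha * f_cut) to (f_max, f_max).
    slope = (f_max - alpha * f_cut) / (f_max - f_cut)
    return alpha * f_cut + slope * (freq - f_cut)


def apply_affine(A, b, x):
    """Apply a CMLLR-style affine feature transform x' = A x + b.

    A is a list of rows, b and x are plain lists of floats.
    Estimating A and b from adaptation data is not shown here.
    """
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(A, b)
    ]


def relative_wer_reduction(wer_before, wer_after):
    """Relative WER reduction in percent."""
    return 100.0 * (wer_before - wer_after) / wer_before


if __name__ == "__main__":
    # Hypothetical warp factor for one speaker.
    print(vtln_warp(1000.0, alpha=1.1))
    # Hypothetical 2-dimensional transform: scale by 2 and shift.
    print(apply_affine([[2.0, 0.0], [0.0, 2.0]], [1.0, -1.0], [0.5, 0.5]))
    # WER figures reported in the abstract: 17.8 % -> 14.8 %.
    print(round(relative_wer_reduction(17.8, 14.8), 1))
```

With the figures from the abstract, the absolute WER reduction is 3.0 percentage points, which corresponds to a relative reduction of roughly 16.9 %.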
Similar resources
Unsupervised speaker indexing using anchor models and automatic transcription of discussions
We present unsupervised speaker indexing combined with automatic speech recognition (ASR) for speech archives such as discussions. Our proposed indexing method is based on anchor models, by which we define a feature vector based on the similarity with speakers of a large scale speech database. Several techniques are introduced to improve discriminant ability. ASR is performed using the results ...
Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis
In the EMIME project, we are developing a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrate two techniques, unsupervised adaptation for HMM-based TTS using a word-based large-vocabulary continuous speech recognizer...
Discriminative MCE-based speaker adaptation of acoustic models for a spoken lecture processing task
This paper investigates the use of minimum classification error (MCE) training in conjunction with speaker adaptation for the large vocabulary speech recognition task of lecture transcription. Emphasis is placed on the case of supervised adaptation, though an examination of the unsupervised case is also conducted. This work builds upon our previous work using MCE training to construct speaker i...
Automatic Transcription of Discussions Using Unsupervised Speaker Indexing
We present unsupervised speaker indexing combined with automatic speech recognition (ASR) for speech archives such as discussions. Our proposed indexing method is based on anchor models, by which we define a feature vector based on the similarity with speakers of a large scale speech database, and we incorporate several techniques to improve discriminant ability. ASR is performed using the resu...
Improved histogram-based feature compensation for robust speech recognition and unsupervised speaker adaptation
Feature compensation for noise robust speech recognition becomes more effective if normalization of time-derivative parameters is taken into account. This paper describes an implementation of Delta-Cepstrum Normalization (DCN) that runs with only minimum response time. The proposed algorithm, referred to as Recursive DCN, provides word error rate improvements comparable to conventional DCN. Sin...
Publication date: 2011